Abstract


Three methods are proposed for estimating the parameters of an autoregressive
process of order p with missing observations. These methods are based on the
maximum likelihood approach and use the EM algorithm, the Newton-Raphson method
and the method of scoring, which are applied to the likelihood equations. Finally,
a comparison of these methods is discussed.
1. Introduction



An autoregressive process $\{y_t,\ t = 0, \pm 1, \ldots\}$ of order $p$ is defined by

(1.1)  $\sum_{i=0}^{p} \gamma_i y_{t-i} = \varepsilon_t, \quad t = 0, \pm 1, \ldots,$

where $\gamma_0 = 1$ and $\{\varepsilon_t\}$ is a sequence of uncorrelated random variables with mean 0 and common variance $\sigma^2$. We assume that the roots of $\sum_{i=0}^{p} \gamma_i z^i = 0$ are outside the unit disc. The process (1.1) is completely specified by $\phi = (\gamma_1, \ldots, \gamma_p, \sigma^2)'$ when the $\varepsilon_t$ are assumed to be normally distributed. Throughout this paper we shall assume normality of $\varepsilon_t$.
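As an illustration of the sign convention in (1.1), the following Python sketch checks the root condition and simulates a Gaussian series. The function names `is_stationary` and `simulate_ar`, the burn-in length and the seed are illustrative assumptions and do not come from the paper.

```python
import numpy as np

def is_stationary(gamma):
    """Check that the roots of sum_{i=0}^p gamma_i z^i (gamma_0 = 1) lie outside the unit disc.

    gamma holds gamma_1, ..., gamma_p in the convention of (1.1).
    """
    coeffs = np.r_[1.0, gamma]        # ascending powers of z
    roots = np.roots(coeffs[::-1])    # np.roots expects the highest power first
    return bool(np.all(np.abs(roots) > 1.0))

def simulate_ar(gamma, sigma2, T, burn_in=500, seed=0):
    """Simulate T values of y_t = eps_t - sum_{i=1}^p gamma_i y_{t-i} with normal eps_t."""
    rng = np.random.default_rng(seed)
    p = len(gamma)
    n = T + burn_in
    eps = rng.normal(0.0, np.sqrt(sigma2), size=n)
    y = np.zeros(n)
    for t in range(n):
        acc = eps[t]
        for i in range(1, p + 1):
            if t - i >= 0:
                acc -= gamma[i - 1] * y[t - i]
        y[t] = acc
    return y[burn_in:]                # drop the burn-in so the start-up effect is small
```

Note that with this convention an AR(1) process written as $y_t = 0.5\, y_{t-1} + \varepsilon_t$ corresponds to $\gamma_1 = -0.5$.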

Usually statistical inference is based on a set of $T$ consecutive observations on $y_t$. Let

(1.2)  $y = (y_1, \ldots, y_T)'$,

and let $P$ be a permutation matrix such that $Py = (s', m')'$, where $s$ is a $(T-m) \times 1$ vector and $m$ is an $m \times 1$ vector, with the ordering in $s$ and $m$ preserved. Suppose only observations in $s$ are available and those in $m$ are missing. Our goal here is to obtain maximum likelihood estimates of $\phi$.

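For any $T \times T$ matrix $C$, let us define $C_{ss}$, $C_{sm}$, $C_{ms}$ and $C_{mm}$ to be the $(T-m) \times (T-m)$, $(T-m) \times m$, $m \times (T-m)$ and $m \times m$ matrices, respectively, satisfying

(1.3)  $P C P' = \begin{pmatrix} C_{ss} & C_{sm} \\ C_{ms} & C_{mm} \end{pmatrix}.$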
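The partition in (1.3) can be sketched in Python as follows. The boolean mask `observed` (True where $y_t$ is available) and the function names are assumptions made for illustration only.

```python
import numpy as np

def permutation_matrix(observed):
    """Return P such that P @ y stacks the observed entries (s) above the missing ones (m)."""
    observed = np.asarray(observed, dtype=bool)
    order = np.r_[np.flatnonzero(observed), np.flatnonzero(~observed)]
    return np.eye(len(observed))[order]   # row k of P is the unit vector e_{order[k]}

def partition(C, observed):
    """Return the blocks C_ss, C_sm, C_ms, C_mm of P C P' as in (1.3)."""
    observed = np.asarray(observed, dtype=bool)
    s_idx = np.flatnonzero(observed)      # ordering within s is preserved
    m_idx = np.flatnonzero(~observed)     # ordering within m is preserved
    C_ss = C[np.ix_(s_idx, s_idx)]
    C_sm = C[np.ix_(s_idx, m_idx)]
    C_ms = C[np.ix_(m_idx, s_idx)]
    C_mm = C[np.ix_(m_idx, m_idx)]
    return C_ss, C_sm, C_ms, C_mm
```

Indexing the blocks directly avoids forming $P$ at all, which is usually preferable for large $T$; the explicit matrix is included only to mirror the notation.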


For the rest of this paper, let $f(y \mid \phi)$ denote the probability density function of $y$, $f(s \mid \phi)$ denote the probability density function of $s$, $f(m \mid s, \phi)$ denote the conditional probability density function of $m$ given $s$, $\log f(y \mid \phi)$ denote the log likelihood function based on $y$ and $\log f(s \mid \phi)$ denote the log likelihood function based on $s$. We assume that the maximum likelihood solutions satisfy

(1.4)  $\dfrac{\partial \log f(s \mid \phi)}{\partial \phi} = 0.$


2.  Some  basic  results 

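Assume that $y$ is distributed as multivariate normal with mean $0$ and covariance matrix $\Sigma$, that is, $f(y \mid \phi)$ is given by

(2.1)  $f(y \mid \phi) = \dfrac{1}{(2\pi)^{T/2} |\Sigma|^{1/2}} \exp\{-\tfrac{1}{2}\, y' \Sigma^{-1} y\}.$

Then $Py = (s', m')'$ is distributed as multivariate normal with mean $0$ and covariance matrix $P \Sigma P'$. Since $P P' = I_T$, where $I_T$ is the $T \times T$ identity matrix,

(2.2)  $(P \Sigma P')^{-1} = P \Sigma^{-1} P' = \dfrac{1}{\sigma^2} \begin{pmatrix} M_{ss} & M_{sm} \\ M_{ms} & M_{mm} \end{pmatrix},$

where $\sigma^2 \Sigma^{-1} = M$, and $M_{ss}$, $M_{sm}$, $M_{ms}$ and $M_{mm}$ are as defined by (1.3). Also, by (1.3), we get

(2.3)  $P \Sigma P' = \begin{pmatrix} \Sigma_{ss} & \Sigma_{sm} \\ \Sigma_{ms} & \Sigma_{mm} \end{pmatrix}.$

Therefore from (2.2) and (2.3), it follows that

(2.4)  $[\mathrm{Cov}(s \mid \phi)]^{-1} = \Sigma_{ss}^{-1} = \dfrac{1}{\sigma^2} [M_{ss} - M_{sm} M_{mm}^{-1} M_{ms}],$

(2.5)  $[\mathrm{Cov}(m \mid s, \phi)]^{-1} = [\Sigma_{mm} - \Sigma_{ms} \Sigma_{ss}^{-1} \Sigma_{sm}]^{-1} = \dfrac{1}{\sigma^2} M_{mm},$

(2.6)  $E[m \mid s, \phi] = \Sigma_{ms} \Sigma_{ss}^{-1} s = -M_{mm}^{-1} M_{ms}\, s,$

(2.7)  $|P M P'| = |M| = |M_{mm}|\, |M_{ss} - M_{sm} M_{mm}^{-1} M_{ms}|.$

From (2.4) and (2.7), we obtain

(2.8)  $f(s \mid \phi) = (2\pi\sigma^2)^{-(T-m)/2} \left( \dfrac{|M|}{|M_{mm}|} \right)^{1/2} \exp\{-\dfrac{1}{2\sigma^2}\, s' [M_{ss} - M_{sm} M_{mm}^{-1} M_{ms}]\, s\}.$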
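The conditional moments in (2.5) and (2.6) can be computed directly from $M$ without inverting $\Sigma$. The following is a minimal Python sketch under that reading; the function name `conditional_moments` and the boolean mask `observed` are illustrative assumptions.

```python
import numpy as np

def conditional_moments(M, sigma2, observed, s):
    """Mean and covariance of the missing block m given the observed block s.

    M is sigma^2 times the inverse covariance of y, as in (2.2).
    Uses E[m | s] = -M_mm^{-1} M_ms s and Cov(m | s) = sigma^2 M_mm^{-1}.
    """
    observed = np.asarray(observed, dtype=bool)
    s_idx = np.flatnonzero(observed)
    m_idx = np.flatnonzero(~observed)
    M_mm = M[np.ix_(m_idx, m_idx)]
    M_ms = M[np.ix_(m_idx, s_idx)]
    mean = -np.linalg.solve(M_mm, M_ms @ s)   # avoids forming M_mm^{-1} for the mean
    cov = sigma2 * np.linalg.inv(M_mm)
    return mean, cov
```

For an autoregressive process $M$ is banded, so $M_{mm}$ is typically sparse and much smaller than $\Sigma_{ss}$, which is one practical reason to work with the right-hand forms of (2.5) and (2.6).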
Expressions (2.5) and (2.6) will be used in the following sections. Though (2.8) gives the expression for the probability density function of $s$, we will not use it to obtain the score function $\partial \log f(s \mid \phi) / \partial \phi$, due to the simplicity of $\partial \log f(y \mid \phi) / \partial \phi$ and Lemma 1 in section 3. We will use (2.8) in proving the asymptotic properties of the estimates in a subsequent paper. Under suitable conditions, the estimates of $\phi$ based on the Newton-Raphson method and the method of scoring are shown to be $\sqrt{T-m}$-consistent, asymptotically normal and one-step asymptotically efficient if the initial estimates are $\sqrt{T-m}$-consistent.

3.  Estimation 


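Let

(3.1)  $\gamma = (\gamma_1, \ldots, \gamma_p)'$

and

(3.2)  $\gamma_u = (1, \gamma')'.$

Then (see Anderson (1971), sec. 6.2, and Box and Jenkins (1976), sec. 7.A.5)

(3.3)  $\log f(y \mid \phi) = -\dfrac{T}{2} \log(2\pi\sigma^2) + \dfrac{1}{2} \log|M| - \dfrac{1}{2\sigma^2}\, y' M y$

$\qquad\qquad\qquad\ = -\dfrac{T}{2} \log(2\pi\sigma^2) + \dfrac{1}{2} \log|M| - \dfrac{1}{2\sigma^2}\, \gamma_u' D \gamma_u,$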


where the elements $m_{st}$ of the $T \times T$ matrix $M$ are given by

(3.4)  $m_{st} = m_{T+1-t,\,T+1-s}$

$\qquad\ = \sum_{j=0}^{\min(s,t)-1} \gamma_j \gamma_{j+|s-t|}, \quad s, t = 1, \ldots, p,$

$\qquad\ = \sum_{i=0}^{p-|s-t|} \gamma_i \gamma_{i+|s-t|}, \quad \max(s,t) \ge p+1,\ \min(s,t) \le T-p,\ |s-t| = 0, 1, \ldots, p,$

$\qquad\ = 0, \quad |s-t| = p+1, \ldots$
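A minimal Python sketch of (3.4) follows, assuming $T \ge 2p$ and using the relation $m_{st} = m_{T+1-t,\,T+1-s}$ to fill the lower-right corner. The names `build_M` and `ar1_covariance` are hypothetical helpers; the latter uses the standard AR(1) autocovariance and serves only to check $\sigma^2 \Sigma^{-1} = M$ numerically for $p = 1$.

```python
import numpy as np

def build_M(gamma, T):
    """Build the T x T matrix M = sigma^2 * Sigma^{-1} from gamma_1..gamma_p, following (3.4)."""
    g = np.r_[1.0, gamma]                 # g[i] = gamma_i with gamma_0 = 1
    p = len(gamma)
    if T < 2 * p:
        raise ValueError("this sketch assumes T >= 2p")
    M = np.zeros((T, T))
    for s in range(1, T + 1):             # 1-based indices as in the text
        for t in range(1, T + 1):
            k = abs(s - t)
            if k > p:
                continue                  # m_st = 0 when |s - t| > p
            if max(s, t) <= p:            # upper-left corner
                n = min(s, t) - 1
            elif min(s, t) <= T - p:      # interior band
                n = p - k
            else:                         # lower-right corner via m_st = m_{T+1-t, T+1-s}
                n = min(T + 1 - t, T + 1 - s) - 1
            M[s - 1, t - 1] = sum(g[j] * g[j + k] for j in range(n + 1))
    return M

def ar1_covariance(gamma1, sigma2, T):
    """Covariance matrix of a stationary AR(1) process y_t + gamma1 * y_{t-1} = eps_t."""
    rho = -gamma1
    idx = np.arange(T)
    lags = np.abs(idx[:, None] - idx[None, :])
    return sigma2 / (1.0 - rho ** 2) * rho ** lags
```

For $p = 1$ the result is tridiagonal with diagonal $(1, 1+\gamma_1^2, \ldots, 1+\gamma_1^2, 1)$ and off-diagonal entries $\gamma_1$.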
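Proof. The result follows immediately from

(3.10)  $E\left\{ \dfrac{\partial \log f(m \mid s, \phi)}{\partial \phi} \,\Big|\, s, \phi \right\} = 0.$

It is clear from (3.7) - (3.9) that

(3.11)  $\dfrac{\partial \log f(s \mid \phi)}{\partial \gamma_j} = \dfrac{1}{2} \dfrac{\partial \log |M|}{\partial \gamma_j} - \dfrac{1}{\sigma^2} \sum_{g=0}^{p} \gamma_g\, E[d_{g+1,j+1} \mid s, \phi], \quad j = 1, \ldots, p,$

and

(3.12)  $\dfrac{\partial \log f(s \mid \phi)}{\partial \sigma^2} = -\dfrac{T}{2\sigma^2} + \dfrac{1}{2\sigma^4}\, \gamma_u'\, E[D \mid s, \phi]\, \gamma_u.$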

The term $\log |M|$ is $O(1)$ (see, e.g., Hannan (1973)) while $d_{ij}$ is $O_p(T)$. The effect of neglecting $\log |M|$ is negligible for moderate or large $T$, and we shall neglect $\log |M|$ and other negligible terms henceforth. From (3.11) and (3.12), it follows that the likelihood equations are given by

(3.13)  $\sum_{g=0}^{p} \gamma_g\, E[d_{g+1,j+1} \mid s, \phi] = 0, \quad j = 1, \ldots, p,$

and

(3.14)  $\sigma^2 = \dfrac{1}{T}\, \gamma_u'\, E[D \mid s, \phi]\, \gamma_u.$


When there are no missing observations, $E[d_{g+1,j+1} \mid s, \phi] = d_{g+1,j+1}$ does not involve unknown parameters. Then the equations are linear in $\gamma_i$, $i = 1, \ldots, p$, and are the Yule-Walker equations. When missing observations do occur, $E[d_{g+1,j+1} \mid s, \phi]$ involves unknown parameters and (3.13) and (3.14) are highly non-linear in the unknown parameters. In fact, from (2.5) and (2.6),

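(3.15)  $E[d_{g+1,j+1} \mid s, \phi] = s' (A_{g+1,j+1})_{ss}\, s - s' (A_{g+1,j+1})_{sm} K s - s' K' (A_{g+1,j+1})_{ms}\, s + s' K' (A_{g+1,j+1})_{mm} K s + \sigma^2\, \mathrm{tr}[(A_{g+1,j+1})_{mm} M_{mm}^{-1}],$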

where $K = M_{mm}^{-1} M_{ms}$ and the matrices involved are as defined in (1.3), (3.4) and (3.6). Therefore solutions of (3.13) and (3.14) are not straightforward and iterative procedures have to be used.

We propose the following three methods of solving (3.13) and (3.14): the EM algorithm, the Newton-Raphson method and the method of scoring.

a. The EM algorithm. Since $\phi$ is to be estimated, it is natural that one replace $E[\,\cdot \mid s, \phi]$ in (3.13) and (3.14) by $E[\,\cdot \mid s, \phi_i]$, where $\phi_i$ is some estimated value of $\phi$, and obtain $\phi_{i+1}$ iteratively by solving

(3.16)  $\sum_{g=0}^{p} (\gamma_g)_{i+1}\, E[d_{g+1,j+1} \mid s, \phi_i] = 0, \quad j = 1, \ldots, p,$

and

(3.17)  $(\sigma^2)_{i+1} = \dfrac{1}{T}\, (\gamma_u)_{i+1}'\, E[D \mid s, \phi_i]\, (\gamma_u)_{i+1}.$

Here $(\gamma_g)_i$ and $(\gamma_u)_i$ denote the estimates of $\gamma_g$ and $\gamma_u$, respectively, at the $i$-th iteration. As shown in Tan (1979), the above method gives the same solutions as the EM algorithm proposed by Dempster, Laird and Rubin (1977).
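To make the iteration concrete, here is a minimal Python sketch for the special case $p = 1$. It is an illustration under stated assumptions, not the author's implementation: it assumes the Box and Jenkins form of the elements of $D$, namely $d_{11} = \sum_{t=1}^{T} y_t^2$, $d_{12} = \sum_{t=1}^{T-1} y_t y_{t+1}$ and $d_{22} = \sum_{t=2}^{T-1} y_t^2$ (the definition of $D$ is not reproduced in this section), and the names `ar1_M` and `em_ar1`, the starting values and the fixed iteration count are hypothetical.

```python
import numpy as np

def ar1_M(gamma1, T):
    """M = sigma^2 * Sigma^{-1} for p = 1: tridiagonal, as given by (3.4)."""
    M = np.zeros((T, T))
    i = np.arange(T)
    M[i, i] = 1.0 + gamma1 ** 2
    M[0, 0] = M[-1, -1] = 1.0
    M[i[:-1], i[1:]] = gamma1
    M[i[1:], i[:-1]] = gamma1
    return M

def em_ar1(y, observed, gamma1=0.0, sigma2=1.0, n_iter=50):
    """Iterate the p = 1 versions of (3.16) and (3.17); entries of y where observed is False are ignored."""
    y = np.asarray(y, dtype=float)
    observed = np.asarray(observed, dtype=bool)
    T = len(y)
    s_idx = np.flatnonzero(observed)
    m_idx = np.flatnonzero(~observed)
    s = y[s_idx]
    for _ in range(n_iter):
        # E-step: conditional mean and covariance of the full series, from (2.5) and (2.6)
        M = ar1_M(gamma1, T)
        y_hat = np.zeros(T)
        y_hat[s_idx] = s
        V = np.zeros((T, T))
        if m_idx.size:
            M_mm = M[np.ix_(m_idx, m_idx)]
            M_ms = M[np.ix_(m_idx, s_idx)]
            y_hat[m_idx] = -np.linalg.solve(M_mm, M_ms @ s)
            V[np.ix_(m_idx, m_idx)] = sigma2 * np.linalg.inv(M_mm)
        second = np.outer(y_hat, y_hat) + V        # E[y y' | s, phi_i]
        d11 = np.trace(second)                     # sum over t = 1..T of E[y_t^2]
        d12 = np.trace(second, offset=1)           # sum over t = 1..T-1 of E[y_t y_{t+1}]
        d22 = np.trace(second[1:-1, 1:-1])         # sum over t = 2..T-1 of E[y_t^2]
        # M-step: (3.16) gives d12 + gamma1 * d22 = 0, then (3.17) gives sigma2
        gamma1 = -d12 / d22
        sigma2 = (d11 + 2.0 * gamma1 * d12 + gamma1 ** 2 * d22) / T
    return gamma1, sigma2
```

With no missing observations the conditional expectations reduce to the observed sums, so the first iteration already returns the solution of the linear equations and further iterations leave it unchanged.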


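b. The Newton-Raphson Method. From (3.11) and (3.12), we obtain

(3.18)  $\theta_{kj} = -\dfrac{\partial^2 \log f(s \mid \phi)}{\partial \gamma_k\, \partial \gamma_j} = \dfrac{1}{\sigma^2} \left\{ \sum_{g=0}^{p} \gamma_g\, \dfrac{\partial}{\partial \gamma_k} E[d_{g+1,j+1} \mid s, \phi] + E[d_{k+1,j+1} \mid s, \phi] \right\}, \quad j, k = 1, \ldots, p.$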

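(3.25)  $\Phi_{p+1,p+1} = E[\theta_{p+1,p+1}] = -\dfrac{T}{2\sigma^4} + \dfrac{T}{\sigma^6} \sum_{i,j=0}^{p} \gamma_i \gamma_j\, \sigma(i-j) - \dfrac{1}{2\sigma^4} \sum_{i,j=0}^{p} \gamma_i \gamma_j\, \mathrm{tr}[(A_{i+1,j+1})_{mm} M_{mm}^{-1}],$

where $\sigma(k) = E[y_{t+k}\, y_t]$. Thus, the method of scoring leads to the following set of equations: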

(3.26)  $\Phi_i\, (\phi_{i+1} - \phi_i) = \dfrac{\partial \log f(s \mid \phi)}{\partial \phi} \Big|_{\phi = \phi_i},$

where the elements $\Phi_{ij}$ of $\Phi$ are given by (3.23) - (3.25), and $\partial \log f(s \mid \phi) / \partial \phi$ is given by (3.11) (without the first term on the right-hand side) and (3.12).

We have used the fact that $E[d_{i+1,j+1}] = [T - (i+j)]\, \sigma(i-j)$, which can be approximated by $T \sigma(i-j)$ for moderate or large $T$.


4. Comparison of the methods of estimation

The estimates of $\phi$ based on the Newton-Raphson method, the method of scoring and the EM algorithm can be expressed in the following form:

$H_i\, (\phi_{i+1} - \phi_i) = \dfrac{\partial \log f(s \mid \phi)}{\partial \phi} \Big|_{\phi = \phi_i},$

where $\partial \log f(s \mid \phi) / \partial \phi$ is given by (3.11) (without the first term on the right-hand side) and (3.12). In the Newton-Raphson method, we have

$H = \theta,$

where $\theta$ is given by (3.18) - (3.21). In the method of scoring, we have
$H = \Phi,$

where $\Phi$ is given by (3.23) - (3.25). In the EM algorithm, we have

$H = \omega,$

where the elements $\omega_{gj}$ of $\omega$ are given by

$\omega_{gj} = \omega_{jg} = \dfrac{1}{\sigma^2}\, E[d_{g+1,j+1} \mid s, \phi], \quad g, j = 1, \ldots, p,$

$\qquad\ = 0, \quad g = 1, \ldots, p,\ j = p+1,$

$\qquad\ = \dfrac{T}{2\sigma^4}, \quad g = p+1,\ j = p+1.$

Let

$A = \theta - \Phi,$

$B = \omega - \theta,$

$C = \omega - \Phi,$


and

$I_m(\phi) = E\left[ -\dfrac{\partial^2 \log f(y \mid \phi)}{\partial \phi\, \partial \phi'} \right] - E\left[ -\dfrac{\partial^2 \log f(s \mid \phi)}{\partial \phi\, \partial \phi'} \right].$
$I_m(\phi)$ is referred to as the lost information matrix in Orchard and Woodbury (1972). It follows that

$E(A) = 0,$

and also

$\lim_{T-m \to \infty} \dfrac{E(B)}{T-m} = \lim_{T-m \to \infty} \dfrac{E(C)}{T-m} = \lim_{T-m \to \infty} \dfrac{I_m(\phi)}{T-m},$

since

$\lim_{T-m \to \infty} \dfrac{E(\omega)}{T-m} = \lim_{T-m \to \infty} \dfrac{1}{T-m}\, E\left[ -\dfrac{\partial^2 \log f(y \mid \phi)}{\partial \phi\, \partial \phi'} \right]$

(see Box and Jenkins (1976), section 7.A.5). In general, $\lim_{T-m \to \infty} I_m(\phi)/(T-m)$ is not negligible. For example,


when $p = 1$ in (1.1) and the process $\{y_t\}$ is periodically observed for $\alpha$ time points and then not observed for two time points, it can be shown that (see Tan (1979))


lim 

T-m-x» 


I  (*) 

«»TQ  •* 

T-m 


2  2  2  4 

o  (l  +2yp  (3  +  -  yj) 


Yl(l  +  2yJ) 


ao 


-  Yj  U  +  ^Yx) 


where $\lambda = (1 + \gamma_1^2 + \gamma_1^4)$. It is easy to see that the above matrix is positive definite.